An Asymptotically Optimal Algorithm for Maximum Matching in Dynamic Streams
We present an algorithm for the maximum matching problem in dynamic
(insertion-deletions) streams with *asymptotically optimal* space complexity:
for any $n$-vertex graph, our algorithm with high probability outputs an
$\alpha$-approximate matching in a single pass using $O(n^2/\alpha^3)$ bits of
space.
A long line of work on the dynamic streaming matching problem has reduced the
gap between space upper and lower bounds first to $n^{o(1)}$ factors
[Assadi-Khanna-Li-Yaroslavtsev; SODA 2016] and subsequently to
$\text{polylog}(n)$ factors [Dark-Konrad; CCC 2020]. Our upper bound now
matches the Dark-Konrad lower bound up to $O(1)$ factors, thus completing this
research direction.
Our approach consists of two main steps: we first (provably) identify a
family of graphs, similar to the instances used in prior work to establish the
lower bounds for this problem, as the only "hard" instances to focus on. These
graphs include an induced subgraph which is both sparse and contains a large
matching. We then design a dynamic streaming algorithm for this family of
graphs which is more efficient than prior work. The key to this efficiency is a
novel sketching method, which bypasses the typical loss of
$\text{polylog}(n)$-factors in space compared to standard $L_0$-sampling
primitives, and can be of independent interest in designing optimal algorithms
for other streaming problems.
Comment: Full version of the paper accepted to ITCS 2022. 42 pages, 5 figures
Tight Bounds for Vertex Connectivity in Dynamic Streams
We present a streaming algorithm for the vertex connectivity problem in
dynamic streams with a (nearly) optimal space bound: for any $n$-vertex graph
$G$ and any integer $k \geq 1$, our algorithm with high probability outputs
whether or not $G$ is $k$-vertex-connected in a single pass using
$\tilde{O}(kn)$ space.
Our upper bound matches the known $\Omega(kn)$ lower bound for this problem
even in insertion-only streams -- which we extend to multi-pass algorithms in
this paper -- and closes one of the last remaining gaps in our understanding of
dynamic versus insertion-only streams. Our result is obtained via a novel
analysis of the previous best dynamic streaming algorithm of Guha, McGregor,
and Tench [PODS 2015], who obtained an $\tilde{O}(k^2 n)$ space algorithm
for this problem. This also gives a model-independent algorithm for computing a
"certificate" of $k$-vertex-connectivity as a union of $O(k^2 \log n)$ spanning
forests, each on a random subset of $O(n/k)$ vertices, which may be of
independent interest.
Comment: Full version of the paper accepted to SOSA 2023. 15 pages, 3 figures
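The certificate construction mentioned in this abstract (a union of spanning forests, each computed on a random vertex subset) can be illustrated with a minimal offline sketch. This is not the paper's streaming algorithm: it reads the edge list directly instead of using linear sketches, and the names `spanning_forest` and `connectivity_certificate`, as well as the parameters `num_subsets` and `sample_prob`, are hypothetical and left for the caller to choose.

```python
import random


def spanning_forest(vertices, edges):
    """Return the edges of a spanning forest of the subgraph induced on `vertices`."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for u, v in edges:
        # Only edges with both endpoints in the sampled subset are considered.
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((u, v))
    return forest


def connectivity_certificate(n, edges, num_subsets, sample_prob, seed=0):
    """Union of spanning forests over independently sampled vertex subsets.

    Assumption: each vertex joins a subset independently with probability
    `sample_prob`; both parameters are illustrative, not the paper's constants.
    """
    rng = random.Random(seed)
    certificate = set()
    for _ in range(num_subsets):
        subset = {v for v in range(n) if rng.random() < sample_prob}
        certificate.update(spanning_forest(subset, edges))
    return certificate
```

For example, `connectivity_certificate(6, edges, num_subsets=20, sample_prob=0.5)` on the complete graph over six vertices returns a subset of the input edges. Vertex connectivity would then be tested on that subgraph rather than on the full graph.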
Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs
Estimating quantiles, like the median or percentiles, is a fundamental task
in data mining and data science. A (streaming) quantile summary is a data
structure that can process a set S of n elements in a streaming fashion and at
the end, for any phi in (0,1], return a phi-quantile of S up to an eps error,
i.e., return a phi'-quantile with phi'=phi +- eps. We are particularly
interested in comparison-based summaries that only compare elements of the
universe under a total ordering and are otherwise completely oblivious of the
universe. The best known deterministic quantile summary is the 20-year-old
Greenwald-Khanna (GK) summary that uses O((1/eps) log(eps n)) space
[SIGMOD'01]. This bound was recently proved to be optimal for all deterministic
comparison-based summaries by Cormode and Veselý [PODS'20].
In this paper, we study weighted quantiles, a generalization of the quantiles
problem, where each element arrives with a positive integer weight which
denotes the number of copies of that element being inserted. The only known
method of handling weighted inputs via GK summaries is the naive approach of
breaking each weighted element into multiple unweighted items and feeding them
one by one to the summary, which results in a prohibitively large update time
(proportional to the maximum weight of input elements).
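To make the weighted-quantile definition concrete, the following minimal offline sketch compares the naive expansion described above with a direct computation over cumulative weights. It is an exact, non-streaming illustration, not a GK summary and not the paper's algorithm. The function names are hypothetical, and it assumes the convention that the phi-quantile is the element of rank ceil(phi * n) in sorted order.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def naive_expand(weighted_items):
    """Break each (element, weight) pair into `weight` unweighted copies."""
    return [x for x, w in weighted_items for _ in range(w)]


def unweighted_quantile(items, phi):
    """Exact phi-quantile: the element of rank ceil(phi * n) in sorted order."""
    ordered = sorted(items)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]


def weighted_quantile(weighted_items, phi):
    """Same answer without expanding: locate the rank in cumulative weights."""
    ordered = sorted(weighted_items)
    cumulative = list(accumulate(w for _, w in ordered))
    rank = max(1, math.ceil(phi * cumulative[-1]))
    # First position whose cumulative weight reaches the target rank.
    return ordered[bisect_left(cumulative, rank)][0]
```

The expansion costs time proportional to the total weight, while the direct version touches each weighted element once after sorting. The update-time gap described above is the streaming analogue of that difference.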
We give the first non-trivial extension of GK summaries for weighted inputs
and show that it takes O((1/eps) log(eps n)) space and O(log(1/eps) + log
log(eps n)) update time per element to process a stream of length n (under some
quite mild assumptions on the range of weights and eps). En route to this, we
also simplify the original GK summaries for unweighted quantiles.
Comment: 33 pages, 7 figures, International Conference on Database Theory 202